Golang Job: Lead Site Reliability Engineer (OnePass)

Location

Melbourne - Australia

Job type

Full-Time

Golang Job Details

Lead Site Reliability Engineer

The Lead Site Reliability Engineer will focus on developing our SRE practices and uplifting our observability capability. You will work with our product development teams to uplift our existing systems and practices, and to measure the quality of experience our customers have across the OnePass products. This is an individual contributor (IC) role.

Key Duties and Deliverables:

  • Roll out and enhance our observability tooling to measure the performance and reliability of our systems
  • Help develop our SRE practices and contribute to our product roadmap with the wider Platform team
  • Identify and remediate the weaker points of our architecture, using modern fault injection methodologies
  • Drive operational excellence and SRE best practices across the engineering group
  • Coach and mentor teams in uplifting their use of SRE practices
  • Plan “game days” to run chaos experiments for teams across our platform
  • Contribute to our team planning, present completed work at showcases, and provide continuous feedback to your team and peers
  • Work with key product and technology stakeholders to demonstrate the benefits of SRE

What You Will Need

  • Prior experience working as a Site Reliability Engineer (or similar role)
  • Experience or familiarity with designing and building distributed systems using event sourcing / event-driven architecture (ideally Kafka) and/or APIs
  • Experience with one or more container scheduling/orchestration products (e.g. Kubernetes, ECS)
  • Experience building systems on any public cloud provider
  • Excellent communication skills. You should be comfortable engaging with developers, architects and product owners, and able to articulate the benefits of SRE practices.
  • Experience with at least one observability product (e.g. Datadog, New Relic) and a solid understanding of distributed tracing
  • Experience implementing performance commitments for products and experiences (e.g. SLAs/SLOs/SLIs)
  • Familiarity with different disaster recovery strategies, load balancing and circuit breaking
  • Proficiency in Golang (nice to have)

Why OneDigital?

  • Working in a community of industry-leading innovators with a diverse and deep set of skills and experience, you will learn, collaborate, and co-create to achieve great things.
  • You’ll have ownership of your role, which will allow you to find the right balance between stretch and sustainability, and between work and life.
  • Culture of camaraderie and collaboration - OneTeam
  • Plus, with all the tools and learning you need, the tone is set for you to shine and succeed.
  • We know that diversity fosters greater innovation and better customer connection, so we strive to create a team where everyone feels like they belong. We support diversity and inclusion, and we are a gender-neutral organisation. We celebrate individuals at their core, so they can shine at their best.


Team Benefits – just to name a few!

To support you at work, what you value in life and in the community, our team member benefits include:

  • Access to the latest technologies
  • An annual budget for your learning and development (this includes learning of your choice, certifications and more)
  • Extended parental leave (16 weeks for primary carers), and paid volunteer days
  • Flexible working – we balance working together with working flexibly
  • People-focussed culture that celebrates achievements big and small
  • Oh, and did we mention paid subscriptions and discount cards?

Next steps

If this sounds like your next career move, then click on the ‘Apply’ button now. Please note that we may commence interviewing candidates prior to the application closing date.